Similarity Measures for Writer Clustering
Author
JAYASHREE SUBRAHMONIA, IBM T.J. Watson Research, P.O. Box 218 / Route 134, Yorktown Heights, NY 10598, U.S.A.
Abstract
This paper addresses the problem of improving the performance of an online, writer-independent, large-vocabulary, unconstrained handwriting recognition system by clustering writers with similar writing styles. Recognition performance is enhanced by identifying the writer cluster that a test writer is closest to and decoding with a model trained for that cluster. The recognition system is based on hidden Markov models. A common set of features is computed for all writers and then projected to a lower-dimensional space that preserves most of the information in the original feature set. This reduced-dimensional space varies from writer to writer. The paper describes two measures of similarity between writing styles. The first is based on the distance between the writer-dependent reduced-dimensional feature subspaces. The second is based on the hidden Markov model output probabilities.
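The abstract does not give formulas for either measure, so the following is only a minimal sketch under stated assumptions. It assumes the writer-dependent subspace is obtained by PCA, that subspaces are compared through their principal angles, and that the probability-based measure is a symmetrised drop in average log-likelihood when one writer's data is scored by another writer's model. The function names (`pca_basis`, `subspace_distance`, `likelihood_dissimilarity`) and these specific choices are illustrative, not the paper's definitions.

```python
import numpy as np


def pca_basis(features, k):
    """Return an orthonormal (d x k) basis of the top-k principal subspace.

    `features` is an (n_samples x d) array of one writer's feature vectors.
    PCA is an assumption here; the abstract only says features are projected
    to a lower-dimensional, writer-dependent space.
    """
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T


def subspace_distance(basis_a, basis_b):
    """Distance between two subspaces computed from their principal angles.

    The singular values of A^T B are the cosines of the principal angles
    when A and B have orthonormal columns. The result is
    sqrt(sum(sin^2(theta_i))), one common choice among several.
    """
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)  # guard against rounding above 1
    return float(np.sqrt(np.sum(1.0 - cosines ** 2)))


def likelihood_dissimilarity(loglik):
    """Symmetrised dissimilarity from cross-writer log-likelihoods.

    loglik[i, j] is assumed to hold the average log-likelihood of writer i's
    data under the model trained for writer j (hypothetical input; computing
    it requires the HMM recogniser, which is not shown here).
    """
    loglik = np.asarray(loglik, dtype=float)
    self_score = np.diag(loglik)[:, None]
    drop = self_score - loglik  # drop[i, j] = loglik[i, i] - loglik[i, j]
    return 0.5 * (drop + drop.T)
```

Either function yields a pairwise dissimilarity that could be handed to a standard clustering routine; a test writer would then be assigned to the cluster with the smallest dissimilarity.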
Related resources
An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering
Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Many research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...
New distance and similarity measures for hesitant fuzzy soft sets
The hesitant fuzzy soft set (HFSS), a combination of hesitant fuzzy sets and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered important tools in different areas such as ...
Presenting a Clustering Algorithm for Categorical Data by Combining Measures
Clustering is one of the main techniques in data mining. It is a process that partitions a data set into groups such that the data within a cluster are as close to each other as possible and the data in different clusters differ as much as possible. Clustering algorithms are divided into two categories according to the type of data: Clustering algorithms for numerical data and clustering algor...
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information-gathering technologies such as GPS and GSM networks has led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
Weighted Ensemble Clustering for Increasing the Accuracy of the Final Clustering
Clustering algorithms are highly dependent on factors such as the number of clusters, the specific clustering algorithm, and the distance measure used. Inspired by ensemble classification, one approach to reducing the effect of these factors on the final clustering is ensemble clustering. Since weighting the base classifiers has been a successful idea in ensemble classification, in th...